fix(setup): durable retry for external dep downloads (Aaron 2026-04-29)#804

Merged
AceHack merged 2 commits into main from
durable-retry-fix-aaron-2026-04-29
Apr 29, 2026

Conversation

@AceHack
Member

@AceHack AceHack commented Apr 29, 2026

Summary

Aaron's mid-tick correction during autonomous-loop tick 05:50Z, verbatim (typos preserved per the verbatim-preservation rule):

"we can retury on external dependency download failures, this goes against DST but we have not choice they are external dependencies we need. Next time instead of kicking a 2nd build we should fix it and reduce friction for future builds."

Earlier in this same session I recovered an elan-toolchain 502 by running gh run rerun --failed. Aaron's correction names that as the wrong fix LOCATION:

  • gh run rerun --failed only papers over THIS build; same flake fails next time.
  • Durable retry inside the code (curl_fetch --retry 5 in tools/setup/) absorbs future flakes too.

Changes

  • memory/feedback_external_dependency_download_retries_durable_fix_not_ephemeral_rerun_aaron_2026_04_29.md (new) — rule for future-Claude: external dep downloads ARE the DST exception class; fix LOCATION matters; when CI hits a transient external-dep failure, FIRST check whether the call site uses curl_fetch; if not, THAT is the durable fix.
  • tools/setup/linux.sh — sources curl-fetch.sh; replaces raw curl -fsSL for the mise tarball download with curl_fetch.
  • tools/setup/common/elan.sh — adds REPO_ROOT detection + sources curl-fetch.sh (elan.sh runs as a subprocess from linux.sh + macos.sh — sourced helpers don't propagate to subprocess shells); replaces raw curl -fsSL for the elan-init.sh download with curl_fetch.
  • memory/MEMORY.md — paired-edit pointer row per index-integrity rule.
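
The subprocess point in the elan.sh bullet can be checked in isolation. The following is a minimal sketch, not the PR's actual code: `demo_fetch`, `helper.sh` and `child.sh` are hypothetical stand-ins for `curl_fetch`, `curl-fetch.sh` and `elan.sh`, and the child locates its helper by its own directory rather than by a real REPO_ROOT lookup.

```shell
# Hypothetical stand-ins: helper.sh plays curl-fetch.sh, child.sh plays elan.sh.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT

cat > "$demo_dir/helper.sh" <<'EOF'
demo_fetch() { echo "demo_fetch called with: $*"; }
EOF

cat > "$demo_dir/child.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Resolve this script's own directory, then source the helper explicitly.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$SCRIPT_DIR/helper.sh"
demo_fetch "$@"
EOF

# The parent sources the helper, as linux.sh does.
. "$demo_dir/helper.sh"

# A child bash process does not inherit the parent's (unexported) functions ...
if bash -c 'declare -F demo_fetch' >/dev/null 2>&1; then
  inherited=yes
else
  inherited=no
fi
echo "function inherited by subprocess: $inherited"

# ... so a script that runs as a subprocess has to source the helper itself.
child_output="$(bash "$demo_dir/child.sh" some-url)"
echo "$child_output"
```

A script that is sourced (rather than executed) by its parent would not need this, which is presumably why only elan.sh needed the extra preamble.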

Composes with

  • memory/feedback_retries_are_non_determinism_smell_DST_holds_investigate_first_2026_04_23.md — parent DST rule; this entry refines the "external uncontrollable" exception with concrete domain + fix-location discipline.
  • tools/setup/common/curl-fetch.sh — existing retry-equipped helper (per Aaron 2026-04-28 framing). This work migrates the call sites that were bypassing it.
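
For readers without the repository at hand, a retry-equipped helper of this kind might look roughly like the sketch below. It is an illustration under assumptions, not the contents of tools/setup/common/curl-fetch.sh: the names `curl_fetch_sketch` and `_sketch_supports_retry_all_errors` and the `<url> <output-file>` argument order are hypothetical. Only the flag set (`--retry 5 --retry-delay 2`, plus `--retry-all-errors` when the local curl supports it) comes from this PR's description.

```shell
# Hypothetical sketch of a retry-equipped download helper.
# The real curl-fetch.sh may differ in names, arguments, and detection logic.

# Return 0 if the local curl's help text advertises --retry-all-errors.
_sketch_supports_retry_all_errors() {
  local help_text
  help_text="$(curl --help all 2>/dev/null || true)"
  case "$help_text" in
    *--retry-all-errors*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumed): curl_fetch_sketch <url> <output-file>
curl_fetch_sketch() {
  local url="$1" out="$2"
  # Base flags: fail on HTTP errors, stay quiet, follow redirects, retry 5 times.
  set -- -fsSL --retry 5 --retry-delay 2
  if _sketch_supports_retry_all_errors; then
    # Older curl builds lack this flag, so add it only when detected.
    set -- "$@" --retry-all-errors
  fi
  curl "$@" -o "$out" "$url"
}
```

Under these assumptions, a call site migrates from `curl -fsSL "$url" -o "$file"` to `curl_fetch_sketch "$url" "$file"`, and every future transient upstream error is retried without anyone rerunning the workflow.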

Test plan

  • bash -n syntax check on the 2 modified shell scripts (linux.sh / elan.sh) plus the helper they source (curl-fetch.sh)
  • Verbatim quote preserved (typos, double-space, all)
  • MEMORY.md updated paired-edit
  • CI build passes (validates the curl_fetch wiring on linux runner)

🤖 Generated with Claude Code

Aaron's mid-tick correction (verbatim, typos preserved):

  "we can retury on external dependency download failures,
   this goes against DST but we have not choice they are
   external dependencies we need.  Next time instead of
   kicking a 2nd build we should fix it and reduce friction
   for future builds."

Earlier in the session I recovered an elan-toolchain 502 by
running `gh run rerun --failed`. Aaron's correction names that
as the wrong fix LOCATION: the rerun made THIS build pass but
did nothing for FUTURE builds hitting the same upstream blip.

The durable fix lives in the code: tools/setup/common/curl-fetch.sh
already provides curl_fetch (file-output, --retry 5
--retry-delay 2 --retry-all-errors) per Aaron's 2026-04-28
framing. Two scripts bypassed it with raw `curl -fsSL`:
  - tools/setup/linux.sh:87  (mise tarball download)
  - tools/setup/common/elan.sh:29  (elan-init.sh download)

Both migrated to curl_fetch. linux.sh adds the source line.
elan.sh runs as a subprocess from linux.sh + macos.sh, so it
must source curl-fetch.sh itself: a helper sourced in the
parent shell does NOT propagate to subprocess shells. elan.sh
therefore adds REPO_ROOT detection plus its own source line.

The new memory file
  memory/feedback_external_dependency_download_retries_durable_fix_not_ephemeral_rerun_aaron_2026_04_29.md
captures the rule for future-Claude:
  - external dep downloads ARE the DST exception class
  - fix LOCATION matters: durable in code > ephemeral rerun
  - when CI hits a transient external-dep failure, FIRST
    check whether the call site uses curl_fetch (or
    equivalent retry-equipped helper); if not, THAT is the
    durable fix; only fall back to rerun for genuine
    one-shot blips that retry-equipped code already
    exhausted
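
The "FIRST check whether the call site uses curl_fetch" step could be scripted along these lines. This is a rough sketch: `find_raw_curl_calls` is a hypothetical name, the regex is a heuristic (it skips whole-line comments but not trailing ones), and `grep -r` / `--include` are GNU/BSD-common options rather than strict POSIX.

```shell
# Hypothetical helper: list lines in *.sh files under a directory that invoke
# `curl` directly instead of going through a curl_fetch-style wrapper.
find_raw_curl_calls() {
  # First grep: "curl" as a standalone command word followed by whitespace
  # (so "curl_fetch" does not match). Second grep: drop whole-line comments.
  grep -rnE --include='*.sh' '(^|[^[:alnum:]_-])curl[[:space:]]' "$1" \
    | grep -vE '^[^:]+:[0-9]+:[[:space:]]*#' \
    || true
}
```

Run as `find_raw_curl_calls tools/setup`, this would be expected to list the helper file itself too, since that is the one place a raw `curl` invocation legitimately lives.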

Composes with:
  - memory/feedback_retries_are_non_determinism_smell_DST_holds_investigate_first_2026_04_23.md
    (parent DST rule; this entry refines the
    "external uncontrollable" exception with concrete domain
    + fix-location discipline)
  - tools/setup/common/curl-fetch.sh (existing helper; this
    work migrates the call sites that were bypassing it)

MEMORY.md updated paired-edit per the index-integrity rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR makes external dependency downloads in the setup scripts more resilient by routing them through the existing retry-capable curl_fetch helper, and records the operational rule that retries are acceptable at the external-dependency boundary when implemented durably in-code (not via workflow reruns).

Changes:

  • Source tools/setup/common/curl-fetch.sh in Linux setup and use curl_fetch for the mise tarball download.
  • Make elan.sh self-sufficient (repo-root detection + sourcing curl-fetch.sh) and use curl_fetch for elan-init.sh.
  • Add a new memory entry documenting “durable retry in code” vs “ephemeral rerun” discipline and index it in memory/MEMORY.md.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tools/setup/linux.sh | Sources curl-fetch.sh and replaces a raw curl -fsSL download with curl_fetch. |
| tools/setup/common/elan.sh | Adds repo-root detection + sources curl-fetch.sh locally; replaces raw curl with curl_fetch. |
| memory/feedback_external_dependency_download_retries_durable_fix_not_ephemeral_rerun_aaron_2026_04_29.md | New memory/rule writeup describing the “durable retry in code” exception and fix-location discipline. |
| memory/MEMORY.md | Adds the new memory entry to the index. |

Comment thread tools/setup/linux.sh Outdated
Comment thread tools/setup/linux.sh Outdated
Comment thread tools/setup/common/elan.sh Outdated
…ry-flag qualifier + line-number drift)

Four Copilot findings on PR #804 addressed:

P1 (linux.sh): named-attribution "Aaron 2026-04-28/29" removed
  from script comments (current-state code surface uses role-refs;
  names go on history surfaces per docs/AGENT-BEST-PRACTICES.md
  §named-attribution-carve-out). Reframed as "DST exception" + "a
  workflow rerun" without naming.

P1 (elan.sh): same named-attribution rewrite.

P2 (linux.sh): comment overstated curl_fetch's retry-flag set as
  unconditional. Clarified that --retry-all-errors is added when
  the local curl supports it (curl-fetch.sh feature-detects
  via `_curl_fetch_supports_retry_all_errors`).

P1 (memory file): hardcoded line numbers (elan.sh:29, linux.sh:87)
  drift with each edit. Replaced with stable anchors (URL constant
  names + grep instructions). Diff illustration kept but caveated.

History-surface attribution (memory file body, commit messages,
PR descriptions) preserves the named-channel quote per the
verbatim-preservation rule. Code-surface comments now use
role-refs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AceHack AceHack merged commit ae1f2d9 into main Apr 29, 2026
27 checks passed
@AceHack AceHack deleted the durable-retry-fix-aaron-2026-04-29 branch April 29, 2026 06:10
AceHack added a commit that referenced this pull request Apr 29, 2026
…y correction (#804) (#805)

* chore(loop-tick-history): tick 05:50Z — drain (3 PRs) + Aaron mid-tick durable-retry correction (#804)

3 CLEAN PRs squash-merged (#801/#802/#803). Aaron's /btw aside
answered (EAT document location). Mid-tick correction caught the
elan-toolchain-502 rerun anti-pattern from earlier this session;
durable fix landed in PR #804 (memory file + linux.sh + elan.sh
migrated to curl_fetch). First non-tick-history substrate this
session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(loop-tick-history): tick 0550Z — Copilot P2 fixes (memory/ prefix + explicit backlog paths)

PR #805 review thread from copilot-pull-request-reviewer:
1. Cross-reference style: memory file path needs `memory/`
   prefix (consistent with other shards).
2. "pending task #307" reads ambiguously (looks like a GH PR
   number). Replace with explicit backlog row paths
   (B-0062 + B-0074 full paths).

Both fixes applied; row content unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 29, 2026
 pending CI (#807)

Two CLEAN PRs merged. PR #804 (durable-retry fix) is now on main —
future external-dep download flakes absorbed via curl_fetch retry.
PR #806 (multi-AI feedback absorb bundle) waiting on CI — real-
dependency wait, not manufactured patience.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 29, 2026
…+ tick 0558Z shard

The human maintainer forwarded a multi-AI synthesis packet during
autonomous-loop tick 05:58Z:
  - Deepseek's reassessment + 5-point pushback after correcting
    earlier-incorrect search results
  - Amara's filter-to-actionables (6 bounded items, with
    "rerun is incident recovery; retry/cache is substrate
    improvement" elevated as the load-bearing line)
  - reference to an older Gemini log on tele+port+leap operational
    resonance (already canonical at
    memory/feedback_operational_resonance_*.md; not re-absorbed)

Verbatim absorb (per the channel-verbatim-preservation rule)
landed at:
  docs/research/multi-ai-feedback-2026-04-29-deepseek-amara-on-loop-state.md
with §33 archive header (Scope / Attribution / Operational status /
Non-fusion disclaimer).

Four small P3 backlog rows file the bounded actionables (per
the maintainer's existing narrowing on task #309 — research-
grade only, no broad new substrate PRs):

  B-0098 — tick-ordinal-continuity lint (or remove ordinals
           from shards entirely; computed > narrated)
  B-0099 — PR-count claims as derived metrics, not narrated prose
  B-0100 — pure-wait tick backpressure / quiescence rule
  B-0101 — small 5-bucket reviewer-artifact classification table

The 5th actionable (external-dep retry/cache) is already
addressed by PR #804 (durable-retry fix landed alongside the
"rerun-is-recovery / retry-is-substrate-improvement" rule).
The 6th actionable (evidence-claim language tightening) is
operational discipline, captured in the tick shard observation
column.

Tick shard at docs/hygiene-history/ticks/2026/04/29/0558Z.md
captures the work-stream summary.

Pattern: ONE consolidated PR for the absorb bundle (research
note + 4 backlog rows + tick shard) rather than 6 separate PRs,
honoring the maintainer's "don't open a bunch of new PRs"
narrowing while still preserving the verbatim record durably.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 29, 2026
…age (B-0101 taxonomy applied) + rebase (#810)

(1) PR #808 squash-merged.
(2) #806's 5 unresolved threads classified per B-0101 taxonomy:
    - 1 DISPLAY_ARTIFACT (Copilot's "|| " excerpt was hallucinated;
      actual file has correct table)
    - 3 REVIEWER_SNAPSHOT_LAG (memory file IS on main post-#804 merge)
    - 1 REAL_DEFECT already fixed in prior tick
    All 5 resolved with explanatory comment.
(3) #806 branch 4 commits behind main → rebased + force-pushed;
    CI recomputing.

First operational use of the B-0101 taxonomy — one tick after
filing it. The pattern: file the rule, validate it on the next
class instance.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 29, 2026
…ck 0558Z (re-open of #806) (#811)

* absorb: multi-AI feedback packet (Deepseek + Amara) + 4 backlog rows + tick 0558Z shard

* fix(B-0101): markdownlint MD022 — collapse split heading to single line

PR #806 markdownlint failure: heading was wrapped across two
`##` lines, which markdownlint reads as two separate headings
without blanks between them (MD022/blanks-around-headings).
Collapsed to single line.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0098/B-0099): Copilot P1 — portable grep boundary + explicit gh-login placeholder

Two real defects on PR #811:

B-0098 — \b is non-portable in POSIX ERE (grep -E). On GNU/BSD
  grep \b is treated as backspace/undefined. Replaced with -w
  (whole-word match, supported on both GNU and BSD grep) and
  added a comment documenting why.

B-0099 — `@me` reads ambiguously in pseudocode (looks like a
  literal token even though it IS valid GitHub search syntax
  for the authenticated user). Replaced with explicit
  `<gh-login>` placeholder + a clarifying note that `@me` also
  works.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0098/B-0099): convergent reviewer corrections (round 3)

Three external reviewers (Amara, Claude.ai, Deepseek) flagged
two precision issues on PR #811:

B-0098 — `grep -w` is GNU/BSD-common but not strictly POSIX-
  portable. Replace single-claim wording with two viable
  patterns: (a) `grep -woE` (GNU/BSD-common) and (b) strict
  portable explicit-boundary pattern. Implementing contributor
  picks based on portability priority.

B-0099 — `author:@me` inside `--search` reads ambiguously and
  is not the documented CLI shape. Replace with `gh pr list
  --author` CLI flag, with both `<your-gh-login>` (explicit,
  preferred for cold readability) and `@me` (valid CLI
  shorthand) shown.
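
The two B-0098 word-boundary options can be compared side by side. The sketch below uses a made-up pattern (`tick [0-9]+`) and made-up input lines, since the actual lint pattern is not shown in this thread. Option (a) relies on `-w`, option (b) spells the boundaries out with POSIX bracket expressions.

```shell
# Sample input (hypothetical): "stick 13" must NOT count as a match for "tick 13".
lines='tick 12
stick 13
see tick 11; done'

# (a) GNU/BSD-common: -w restricts matches to whole words.
count_a="$(printf '%s\n' "$lines" | grep -cwE 'tick [0-9]+')"

# (b) Explicit boundaries: start-of-line or a non-word character before the
#     match, end-of-line or a non-word character after it.
count_b="$(printf '%s\n' "$lines" \
  | grep -cE '(^|[^[:alnum:]_])tick [0-9]+([^[:alnum:]_]|$)')"

echo "matching lines: (a)=$count_a (b)=$count_b"
```

Both variants should agree on which lines match. The difference is that (b) consumes the boundary characters, which matters if `-o` is later added to extract the match itself.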

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0101): split SNAPSHOT_MISMATCH into backward-stale + forward-dependent (Amara round-3+round-4)

Amara's correction: REVIEWER_SNAPSHOT_LAG was too broad —
covered both temporal directions. Split into:

  SNAPSHOT_MISMATCH (parent)
    ├─ BACKWARD_STALE_SNAPSHOT — reviewer behind reality
    └─ FORWARD_CROSS_PR_REFERENCE — PR references sibling
         work not yet on base; valid only IF merge order
         is enforced

Same family, different remedies. Backward = verify-and-
resolve. Forward = encode dependency + don't resolve as
"valid post-merge" unless mechanically enforced.

Distilled rule (Amara): "A forward reference is not wrong
if the dependency is enforced. A forward reference is wrong
if the dependency is only hoped."
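
One way to read the split as a decision procedure is sketched below. The category names come from the commit message above; the function name `remedy_for`, the `yes`/`no` second argument, and the remedy strings are hypothetical labels chosen for illustration.

```shell
# Hypothetical mapping from thread classification to remedy.
# Usage: remedy_for <category> [merge-order-enforced: yes|no]
remedy_for() {
  category="$1"
  enforced="${2:-no}"
  case "$category" in
    BACKWARD_STALE_SNAPSHOT)
      # Reviewer is behind reality: confirm against main, then resolve.
      echo "verify-and-resolve" ;;
    FORWARD_CROSS_PR_REFERENCE)
      # Valid only if the merge order is mechanically enforced.
      if [ "$enforced" = "yes" ]; then
        echo "resolve-after-enforced-merge"
      else
        echo "encode-dependency"
      fi ;;
    *)
      echo "unknown-category"
      return 1 ;;
  esac
}
```

The second argument exists to encode the distilled rule: a forward reference is treated as resolvable only when the dependency is enforced, never when it is merely hoped.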

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0098): Codex P1 — strict-POSIX example must use only POSIX features

Codex correctly flagged that my "strict POSIX-portable"
example used bash-only features:
- [[ ]] (bash test, not POSIX)
- [[ a == *b* ]] (bash glob match, not POSIX)
- $(ls -1 ... | sort) for iteration

Replaced with strict POSIX:
- direct glob iteration `for file in pattern; do`
- `case ... in pattern) ;; esac` for glob match
- `printf` instead of `warn` (warn is shell function, not POSIX)
- redirect to stderr (>&2)
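
A compact illustration of those four replacements, written against plain POSIX sh. The function name `warn_on_unexpected_shards` and the check itself (flag any .md file that does not follow the HHMMZ.md shard naming seen in this thread) are assumptions for the sake of the example; the actual B-0098 snippet is not reproduced here.

```shell
# Hypothetical POSIX-only sketch: warn about .md files in a tick directory
# whose names do not look like HHMMZ.md (for example 0558Z.md).
warn_on_unexpected_shards() {
  dir="$1"
  # Direct glob iteration instead of $(ls -1 ... | sort).
  for file in "$dir"/*.md; do
    # If nothing matched, the glob stays literal; skip it.
    [ -e "$file" ] || continue
    # case-based glob match instead of [[ ... == pattern ]].
    case "$file" in
      */[0-9][0-9][0-9][0-9]Z.md) ;;
      *)
        # printf to stderr instead of a project-specific warn function.
        printf 'warning: unexpected shard name: %s\n' "$file" >&2 ;;
    esac
  done
}
```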

Option (a) keeps bashisms since it's labeled "GNU/BSD-common"
(works on every 4-shell target including bash + zsh on
realistic systems).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

2 participants